
fix(examples): handle NaNs in 14-transfer-learning air dataset #3116

Open
jbbqqf wants to merge 1 commit into unit8co:master from jbbqqf:fix/2752-transfer-learning-nan
@jbbqqf jbbqqf commented May 9, 2026

Fixes #2752.

Summary

examples/14-transfer-learning.ipynb fails on the air dataset with ValueError: ... NaN ... errors during model training. The raw carrier_passengers.csv contains gaps that survive longest_contiguous_slice() (e.g. zero-rows the slice helper does not flag as missing), so the resulting series carry NaNs into training. As @rafmacalaba reported in #2752, the cleanest workaround is to call fill_missing_values(...) after the slice. @dennisbader acknowledged there that the notebook needs refactoring; it is one of the few examples not run in CI, because it is slow.

This PR applies that fix in the smallest possible patch:

  • Import fill_missing_values from darts.utils.missing_values in the imports cell.
  • Inside load_air(), call fill_missing_values(series, fill="auto") immediately after longest_contiguous_slice(). An inline comment cites the issue so a future reader doesn't have to dig.
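
For readers unfamiliar with the helper, the sketch below illustrates the idea behind the fill step in plain Python. It is a conceptual stand-in, not darts' implementation: my understanding is that fill="auto" delegates to pandas interpolation, and the function name fill_gaps_linear and the sample values are made up for illustration.

```python
import math


def fill_gaps_linear(values):
    """Return a copy of `values` with NaNs replaced by interpolated values.

    Interior gaps are filled linearly between the nearest valid neighbours;
    leading and trailing gaps take the nearest valid value.
    (Hypothetical helper for illustration only, not part of darts.)
    """
    filled = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        return filled  # nothing to anchor the interpolation on

    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            span = values[right] - values[left]
            filled[i] = values[left] + span * (i - left) / (right - left)
    return filled


# Made-up monthly passenger counts with an interior and a trailing gap.
passengers = [100.0, math.nan, math.nan, 130.0, math.nan]
filled = fill_gaps_linear(passengers)
print(filled)  # [100.0, 110.0, 120.0, 130.0, 130.0]
```

In the notebook itself the equivalent step is the single call fill_missing_values(series, fill="auto") on the sliced series.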

Other Information

  • No code in the darts/ package is touched. Only examples/14-transfer-learning.ipynb and CHANGELOG.md.
  • CHANGELOG entry under Unreleased > Fixed.
  • A maintainer stated that the notebook isn't part of CI, so adding a unit test wouldn't fit existing conventions. The notebook diff itself serves as the regression artifact: it is what unblocks users running the example end-to-end.

Reproduce BEFORE/AFTER yourself (copy-paste)

```bash
# --- one-time setup ---
git clone https://github.com/unit8co/darts.git /tmp/darts-2752 && cd /tmp/darts-2752
uv sync --group dev

# --- inspect the difference (no need to run the full notebook) ---

# BEFORE (origin/master): load_air does NOT call fill_missing_values
git checkout origin/master
grep -n "fill_missing_values" examples/14-transfer-learning.ipynb || echo "NOT PRESENT"
# Expected: "NOT PRESENT" (the helper is neither imported nor called).

# AFTER (this PR): load_air now imports + calls fill_missing_values
git fetch https://github.com/jbbqqf/darts.git fix/2752-transfer-learning-nan
git checkout FETCH_HEAD
grep -n "fill_missing_values" examples/14-transfer-learning.ipynb
# Expected: 2 hits, one in the imports cell and one inside load_air()
#           right after longest_contiguous_slice().
```

If you want to verify functionally (slow — downloads the dataset and runs the local-models cells):

```bash
# AFTER, end-to-end smoke
uv run jupyter nbconvert --to notebook --execute examples/14-transfer-learning.ipynb \
    --output /tmp/14-executed.ipynb --ExecutePreprocessor.timeout=600
# Expected: no NaN-related ValueError; eval_local_model finishes with finite SMAPE values.
```
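
To see why "finite SMAPE values" is a meaningful check, the sketch below shows how a single NaN in the target poisons the whole score. It uses a hand-rolled sMAPE under a common definition (200 * mean(|y - ŷ| / (|y| + |ŷ|))), not darts.metrics.smape, and the numbers are invented.

```python
import math


def smape(actual, predicted):
    """Symmetric MAPE in percent, hand-rolled for illustration."""
    terms = [
        abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    return 200.0 * sum(terms) / len(terms)


# Gap-free target: the score is an ordinary finite number.
clean = smape([100.0, 120.0], [110.0, 120.0])

# One NaN in the target: every arithmetic step involving it yields NaN,
# so the mean, and therefore the score, is NaN as well.
dirty = smape([100.0, math.nan], [110.0, 120.0])

print(math.isfinite(clean), math.isnan(dirty))  # True True
```

This is only the arithmetic side of the problem; whether darts raises or returns NaN at a given step depends on the model and metric involved.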

What I ran locally

  • Validated the notebook is well-formed JSON and contains 76 cells (unchanged count).
  • Confirmed fill_missing_values import + call are present in the patched notebook.
  • Confirmed the diff is limited to the imports cell and the load_air() cell (plus the CHANGELOG).
  • Did not execute the full notebook locally because it requires downloading the M3 / M4 / carrier_passengers datasets (the same reason maintainers don't run it in CI, per #2752).
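
The structural checks above can be approximated with the standard library alone. The sketch below is one possible way to do it, not the exact commands that were run: the helper name cells_mentioning is hypothetical, and a tiny in-memory notebook stands in for the real file so the snippet runs without a checkout.

```python
import json


def cells_mentioning(notebook, needle):
    """Return the indices of cells whose source contains `needle`."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # Notebook sources are stored either as a list of lines or a string.
        text = "".join(source) if isinstance(source, list) else source
        if needle in text:
            hits.append(index)
    return hits


# Minimal stand-in for the notebook JSON (the real file has far more cells).
raw = json.dumps({
    "cells": [
        {"cell_type": "code",
         "source": ["from darts.utils.missing_values import fill_missing_values\n"]},
        {"cell_type": "markdown", "source": ["## Air dataset\n"]},
        {"cell_type": "code",
         "source": ["series = series.longest_contiguous_slice()\n",
                    "series = fill_missing_values(series, fill=\"auto\")\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

notebook = json.loads(raw)  # raises json.JSONDecodeError if malformed
cell_count = len(notebook["cells"])
hits = cells_mentioning(notebook, "fill_missing_values")
print(cell_count, hits)  # 3 [0, 2]
```

Against the real file, replace the in-memory string with json.load() on examples/14-transfer-learning.ipynb and compare the cell count before and after the patch.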

Edge cases tested

| # | Scenario | Behavior |
|---|----------|----------|
| 1 | Air dataset with residual NaN-bearing series | fill_missing_values(fill="auto") interpolates the gaps before scaling, so downstream models receive finite values |
| 2 | Air dataset series that were already gap-free | fill_missing_values is a no-op; no behavioral change |
| 3 | Series too short or raising in longest_contiguous_slice() | Caught by the existing try/except continue block; the fix is positioned after that guard |
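
The ordering that the three scenarios rely on (guard first, fill second) can be sketched as follows. This is a simplified model of the control flow, not the notebook's load_air(): slice_or_raise and fill_forward are hypothetical stand-ins for longest_contiguous_slice() and fill_missing_values(), and the data is invented.

```python
import math


def slice_or_raise(values, min_length=3):
    """Stand-in for the slicing step: reject series that are too short."""
    if len(values) < min_length:
        raise ValueError("series too short")
    return values


def fill_forward(values):
    """Stand-in for the fill step: carry the last valid value forward.

    A leading NaN has no earlier value to copy and is left untouched.
    """
    filled, last = [], None
    for v in values:
        if math.isnan(v) and last is not None:
            v = last
        filled.append(v)
        if not math.isnan(v):
            last = v
    return filled


def load_all(raw_series):
    loaded = []
    for values in raw_series:
        try:
            values = slice_or_raise(values)
        except ValueError:
            continue  # scenario 3: skipped before the fill step is reached
        loaded.append(fill_forward(values))  # scenarios 1 and 2
    return loaded


raw_series = [
    [10.0, math.nan, 12.0, 13.0],  # scenario 1: residual gap gets filled
    [5.0, 6.0, 7.0],               # scenario 2: already gap-free, unchanged
    [1.0],                         # scenario 3: too short, dropped
]
result = load_all(raw_series)
print(result)  # [[10.0, 10.0, 12.0, 13.0], [5.0, 6.0, 7.0]]
```

Because the fill sits after the guard, a series that fails slicing never reaches it, so the new call cannot change which series are skipped.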

Risk / blast radius

  • Notebook-only change. No runtime code in darts/ is modified.
  • fill_missing_values(..., fill="auto") is a public, documented helper used elsewhere in the codebase, so the new dependency is benign.

PR drafted with assistance from Claude Code (Anthropic). The change was reviewed manually against unit8co/darts@master, the discussion in #2752, and the proposed fix from the original reporter. Reviewer can paste the BEFORE/AFTER block above to verify.

…co#2752)

The transfer-learning notebook fails on the air dataset because the raw
carrier_passengers.csv contains gaps that survive longest_contiguous_slice()
(e.g. zero-rows that the slice helper does not treat as missing). The
resulting NaNs propagate into model training and produce SMAPE = NaN
errors, as reported in unit8co#2752.

Fix: call fill_missing_values(series, fill="auto") inside load_air()
right after longest_contiguous_slice(), with an inline comment pointing
at the issue so a future reader doesn't have to dig through history.
Imports the helper from darts.utils.missing_values.

Maintainer @dennisbader acknowledged in unit8co#2752 that the notebook needs
refactoring (it's one of the only ones that isn't tested in CI because
it's slow); this is the smallest possible change that unblocks users
running the example end-to-end.

CHANGELOG entry under Unreleased > Fixed.

Refs: unit8co#2752

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jbbqqf jbbqqf requested a review from dennisbader as a code owner May 9, 2026 22:35
Collaborator

@jakubchlapek jakubchlapek left a comment

Hey @jbbqqf, thanks for this PR :)

Looks good, small change requested.
It would also be really helpful for us if this PR established why the notebook started raising an error in the first place (whether the data changed or there is a new interaction with the code). Would you be willing to check that out as part of this pull request? :) Thanks

Comment thread CHANGELOG.md

**Fixed**

- Fixed `examples/14-transfer-learning.ipynb` raising a NaN error on the `air` dataset: the `load_air()` helper now calls `fill_missing_values(..., fill="auto")` after `longest_contiguous_slice()` to handle residual gaps in the raw `carrier_passengers.csv`. Closes [#2752](https://github.com/unit8co/darts/issues/2752).

For future reference, refer to the PR number and link here :) Also, if you want, take credit (by ...)

Suggested change
- Fixed `examples/14-transfer-learning.ipynb` raising a NaN error on the `air` dataset: the `load_air()` helper now calls `fill_missing_values(..., fill="auto")` after `longest_contiguous_slice()` to handle residual gaps in the raw `carrier_passengers.csv`. Closes [#2752](https://github.com/unit8co/darts/issues/2752).
- Fixed `examples/14-transfer-learning.ipynb` raising a NaN error on the `air` dataset by calling `fill_missing_values(..., fill="auto")` to handle residual gaps in the raw `carrier_passengers.csv`. Closes [#3116](https://github.com/unit8co/darts/pull/3116).

Comment on lines +251 to +252
" # downstream and break model training. See issue #2752.\n",
" series = fill_missing_values(series, fill=\"auto\")\n",

bit too verbose compared to the rest of the notebook
should be fine with just

Suggested change
" # downstream and break model training. See issue #2752.\n",
" series = fill_missing_values(series, fill=\"auto\")\n",
" # fill missing values\n",
" series = fill_missing_values(series, fill=\"auto\")\n",

Development

Successfully merging this pull request may close these issues.

[BUG] examples/14-transfer-learning.ipynb error on air_train because of missing values

2 participants